-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
EIP-4844: Free the blobs #3244
EIP-4844: Free the blobs #3244
Conversation
This PR reintroduces and further decouples blocks and blobs in EIP-4844, so as to improve network and processing performance. Block and blob processing, for the purpose of gossip validation, are independent: they can both be propagated and gossip-validated in parallel - the decoupled design allows 4 important optimizations (or, if you are so inclined, removes 4 unnecessary pessimizations): * Blocks and blobs travel on independent meshes allowing for better parallelization and utilization of high-bandwidth peers * Re-broadcasting after validation can start earlier allowing more efficient use of upload bandwidth - blocks for example can be rebroadcast to peers while blobs are still being downloaded * bandwidth-reduction techniques such as per-peer deduplication are more efficient because of the smaller message size * gossip verification happens independently for blocks and blobs, allowing better sharing / use of CPU and I/O resources in clients With growing block sizes and additional blob data to stream, the network streaming time becomes a dominant factor in propagation times - on a 100mbit line, streaming 1mb to 8 peers takes ~1s - this process is repeated for each hop in both incoming and outgoing directions. This design in particular sends each blob on a separate subnet, thus maximising the potential for parallelisation and providing a natural path for growing the number of blobs per block should the network be judged to be able to handle it. Changes compared to the current design include: * `BlobsSidecar` is split into individual `BlobSidecar` containers - each container is signed individually by the proposer * the signature is used during gossip validation but later dropped. * KZG commitment verification is moved out of the gossip pipeline and instead done before fork choice addition, when both block and sidecars have arrived * clients may verify individual blob commitments earlier * more generally and similar to block verification, gossip propagation is performed solely based on trivial consistency checks and proposer signature verification * by-root blob requests are done per-blob, so as to retain the ability to fill in blobs one-by-one assuming clients generally receive blobs from gossip * by-range blob requests are done per-block, so as to simplify historical sync * range and root requests are limited to `128` entries for both blocks and blobs - practically, the current higher limit of `1024` for blocks does not get used and keeping the limits consistent simplifies implementation - with the merge, block sizes have grown significantly and clients generally fetch smaller chunks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome!
I did a first pass. Looks good, a number of small suggestions and questions
specs/eip4844/beacon-chain.md
Outdated
@@ -44,6 +44,7 @@ This upgrade adds blobs to the beacon chain as part of EIP-4844. This is an exte | |||
| Name | SSZ equivalent | Description | | |||
| - | - | - | | |||
| `VersionedHash` | `Bytes32` | | | |||
| `BlobIndex` | `uint64` | | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why define this here? Just because we don't have a precedent for defining types in the p2p? Seems a bit more natural to define it next to the BlobSideCar
def in p2p or validator
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's where we've defined it for other general "index" types (like the ValidatorIndex
) - not an opinion, happy to move
specs/eip4844/fork-choice.md
Outdated
|
||
```python | ||
def is_data_available(slot: Slot, beacon_block_root: Root, blob_kzg_commitments: Sequence[KZGCommitment]) -> bool: | ||
# `retrieve_blobs_sidecar` is implementation and context dependent, raises an exception if not available. | ||
# Note: the p2p network does not guarantee sidecar retrieval outside of `MIN_EPOCHS_FOR_BLOBS_SIDECARS_REQUESTS` | ||
sidecar = retrieve_blobs_sidecar(slot, beacon_block_root) | ||
sidecars = retrieve_blob_sidecars(slot, beacon_block_root) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't like that we have two different message deliveries -- either in separate sidecars or in an aggregate sidecar -- which makes this retrieval and validation very context dependent. Did you get it from gossip or did you get it in an aggregate side car via byRange.
My preference would be to make this retrieve_aggregate_sidecar
and assume that if you got it from local gossip, you can bundle it into the aggregate structure. But this depends on which type of proof (individual or aggregate) is sent in the individual sidecars
Or simply that we abstract the details here such that these fork-choice functions don't care how the blobs were received
cc: @asn-d6
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or simply that we abstract the details here such that these fork-choice functions don't care how the blobs were received
This would be my preference - the design I'm aiming for is to allow the client to validate some commitments individually and some in aggregate, but still mandating that all have passed some validation, before passing the block to fork choice - we could see the aggregation as a pure optimization (assuming aggregate+verify is faster than multiple-single-verifies) - this is the case for "batch verification" of attestations for example.
Actually, that reminds me: we use "batch" to denote any form of aggregation done for the purpose of performance whereas "aggregate" is done for spec reasons. Named that way, we can treat everything up to here as individual blobs without aggregation, but also expose batched-validation-functions in relevant kzg libraries (again, assuming this makes sense, crypto-wise - I do not have a clear picture of this).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The plan at the moment is to provide individual KZG proofs as described here #3244 (comment) . We will still be able to batch-validate them in an amortized way if you have a bunch of blobs and proofs, but we won't use a special aggregated proof to do so.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here, I think there's a comma missing.
@arnetheduck since this PR is under active development and is still a draft, I'm not going to force-push to fix the conflicts now. Let me know if you need me to fix the conflicts. |
@hwwhww thanks, I pushed a merge commit fixing the renames :) |
Here is the simulation results on block + blobs dissemination delays (coupled v.s. decoupled). Decoupled approach definitely wins with that respect: Decoupling also appears to save global network traffic: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
excellent! getting close
FYI I'm looking into linter and test errors after fixing conflicts. 👀 |
@arnetheduck @asn-d6 @dankrad, it includes some |
@hwwhww excellent, thanks! I've pushed the commits here too |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fantastic work. just a couple of minor comments and a request for input from @michaelsproul
planning on merging tomorrow
specs/deneb/p2p-interface.md
Outdated
- _[IGNORE]_ The blob's block's parent (defined by `sidecar.block_parent_root`) has been seen (via both gossip and non-gossip sources) (a client MAY queue blocks for processing once the parent block is retrieved). | ||
- _[REJECT]_ The proposer signature, `signed_blob_sidecar.signature`, is valid with respect to the `sidecar.proposer_index` pubkey. | ||
- _[IGNORE]_ The sidecar is the only sidecar with valid signature received for the tuple `(sidecar.slot, sidecar.proposer_index, sidecar.index)`. | ||
-- If full verification of the blob fails at a later processing stage, clients MUST clear the blob from this "seen" cache so as to allow a the valid blob to propagate. The next block producer MAY orphan the block if they have observed multiple blobs signed by the proposer for the same "seen" tuple. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand now. It's using the proposer boost (and if low attestation weight so they know it will likely work).
Hmm, I'm okay putting this in here but maybe should merge the other orphaning spec.
Also the language probably needs to be "MAY attempt to orphan"
CC: @michaelsproul, the re-org god
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's using the proposer boost
I believe lighthouse calls it proposer reorg which in some ways is the opposite of proposer boost: this does not appear in the spec anywhere but rather is a lighthouse-only feature as of now (afair) in which lighthouse validators penalize nodes that publish late blocks.
The open question here is whether we need the collaboration of other nodes to pull this off: do they have to keep track of the duplicate blobs and "blacklist" the offending block as well or is it enough that the next proposer does so?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Proposer re-orgs rely on the proposer-boost being larger than the weight of the attestations that the previous proposer managed to get (e.g. low because they released block late).
A few things
- I don't think this really belongs in the networking spec at all. This is a fork-choice capability and a validator option
- The obvious way to map that re-org functionality here is if the duplicate blobs reduce the ability of the block to garner attestations, leaving it re-orgable by proposer-boost weight at the next slot. I'm generally okay with making this into the fc/validator spec
- Doing this in a coordinated fashion (rather than just by the leader in clear times that they can) seems much more dangerous and likely to lead to network splits, missed attestations, and other bad things. So I'm not really open to "everyone can try to re-org it" unless we have a very clear how and a clear analysis on the chances, the impact, the potential losses from honest folks, etc.
My preference for this PR would be to remove this line and move this convo into and issue
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
specs/deneb/p2p-interface.md
Outdated
- _[IGNORE]_ The blob's block's parent (defined by `sidecar.block_parent_root`) has been seen (via both gossip and non-gossip sources) (a client MAY queue blocks for processing once the parent block is retrieved). | ||
- _[REJECT]_ The proposer signature, `signed_blob_sidecar.signature`, is valid with respect to the `sidecar.proposer_index` pubkey. | ||
- _[IGNORE]_ The sidecar is the only sidecar with valid signature received for the tuple `(sidecar.slot, sidecar.proposer_index, sidecar.index)`. | ||
-- If full verification of the blob fails at a later processing stage, clients MUST clear the blob from this "seen" cache so as to allow a the valid blob to propagate. The next block producer MAY orphan the block if they have observed multiple blobs signed by the proposer for the same "seen" tuple. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-- If full verification of the blob fails at a later processing stage, clients MUST clear the blob from this "seen" cache so as to allow a the valid blob to propagate. The next block producer MAY orphan the block if they have observed multiple blobs signed by the proposer for the same "seen" tuple. | |
-- If full verification of the blob fails at a later processing stage, clients MUST clear the blob from this "seen" cache so as to allow a potential valid blob to propagate. | |
-- The next block producer MAY orphan the block if they have observed multiple blobs signed by the proposer for the same "seen" tuple. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to put this note about clearing the seen cache? We have the same problem with attestations and blocks today. IIUC this cache and IGNORE condition can slow down propagation, but IWANT/IHAVE and subsequently querying for message dependencies will still allow getting such messages even if the cache here hits.
Curious your perspective on this before merge
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have the same problem with attestations and blocks today.
In the past, we've considered it "risky" enough for the proposer to try to poison this seen cache because slashing conditions.
Now that I think about it, maybe the most simple solution, rather than playing these "first" games, is to remove the condition altogether - we want duplicates to propagate to a certain extent so they reach the validators that are in a position to enforce the rules.
I've opened #3257 which covers blocks and attestations.
The rule here is a bit more nuanced in that there's no actual slashing message to rely on - instead, we would rely on the "softer" enforcement of having the next honest proposer orphan such a history. Having written #3257, I think the best course of action is actually to remove the rule altogether and allow implementations to do spam prevention measures individually, for example by limiting to N observed tuples or IGNORE:ing duplicates outside of a much tighter 4s window.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but IWANT/IHAVE and subsequently querying for message dependencies
IGNORE messages are also not given to the IWANT message cache, they're gone.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still don't think we should clear the seen cache. In the event of a malicious proposer, message dependencies are the best path forward, imo. Gossip will be much slower for such a dishonest proposer, potentially leading to orphaning, but due to dependencies not a permanent split
note, getting a block from a peer doesn't necessarily mean they have the correct blobs, but getting an attestation or child block from a peer does. So these dependency messages become signals on who to download the blobs from and won't lead to unresolvable partitions.
Additionally, if you don't get any dependency messages, then the block will be orphaned. which was at the direct cause of the proposer trying to dos the blobs topic.
So I'd say, keep the cache.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My preference is to keep the cache as we do in other message types and to take this to an issue for discussion. Instead of blocking on this to merge the bulk of the PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
54d2559 - good point about the block likely being orphaned anyway - let's hope that is discouragement enough.
LGTM! |
specs/deneb/p2p-interface.md
Outdated
- _[IGNORE]_ The sidecar is from a slot greater than the latest finalized slot -- i.e. validate that `sidecar.slot > compute_start_slot_at_epoch(state.finalized_checkpoint.epoch)` | ||
- _[IGNORE]_ The blob's block's parent (defined by `sidecar.block_parent_root`) has been seen (via both gossip and non-gossip sources) (a client MAY queue blocks for processing once the parent block is retrieved). | ||
- _[REJECT]_ The proposer signature, `signed_blob_sidecar.signature`, is valid with respect to the `sidecar.proposer_index` pubkey. | ||
- _[IGNORE]_ The sidecar is the only sidecar with valid signature received for the tuple `(sidecar.slot, sidecar.proposer_index, sidecar.index)`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Implementing this, any thoughts on using (sidecar.block_root, sidecar.index)
. This is an implementation detail but I find that nicer to be able to share the same cache for serving BlobIdentifier
requests.
EDIT: this won't work on equivocation for blocks with different contents
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
EDIT: this won't work on equivocation for blocks with different contents
this would be slashable, if done by the same proposer, so it seems fine - or is there another case you're thinking of?
* also, use root/index for uniqueness
Co-authored-by: g11tech <develop@g11tech.io>
sama as block
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🔥
This PR reintroduces and further decouples blocks and blobs in EIP-4844, so as to improve network and processing performance.
Block and blob processing, for the purpose of gossip validation, are independent: they can both be propagated and gossip-validated in parallel - the decoupled design allows 4 important optimizations (or, if you are so inclined, removes 4 unnecessary pessimizations):
This design in particular sends each blob on a separate subnet, thus maximising the potential for parallelisation and providing a natural path for growing the number of blobs per block.
Changes compared to the current design include:
BlobsSidecar
is split into individualBlobSidecar
containers - each container is signed individually by the proposer - the signature is used during gossip validation but later dropped.128
entries for both blocks and blobs - practically, the current higher limit of1024
for blocks does not get used and keeping the limits consistent simplifies implementation - with the merge, block sizes have grown significantly and clients generally fetch smaller chunks.https://hackmd.io/@arnetheduck/BJiJw012j contains further notes and information.
TODO: